shrinkage estimators more efficacy.

Generally, the aggregate methods currently in use are

I. INTRODUCTION


A. PURPOSE 

This is a continuation of a pilot study started by Major 
D.D. Tucker in a thesis [Ref. 1] submitted at the Naval 
Postgraduate School in September 1985. The reader is 
referred to Tucker [Ref. 1] for most of the background 
information, including a detailed discussion of the Marine 
Corps officer attrition and promotion structure, the officer 
manpower planning process, and the attrition rate models 
explored by Major Tucker. Only information that is
immediately pertinent will be repeated.


B. BACKGROUND 

The United States Marine Corps has about 20,000 offi- 
cers. These can be cross classified into 40 military occupa- 
tional fields (OF), 31 lengths of service (LOS), and 10 
grades, or 12,400 categories for manpower planning purposes. 
About half (6149) of these categories, called hereafter 
cells, are unoccupied for structural reasons, e.g., due to 
policy decisions concerning minimum and maximum lengths of 
service for each grade, and the allowable grades in each OF. 
Estimates of the attrition rates from these cells support a
number of Marine Corps models, and accurate prediction of 
the rates is basic to effective manpower utilization. 

The goal of this pilot study is to find efficient ways 
to estimate attrition rates (i.e., the rate of leaving the
service, not of changing OF, LOS, or grade) for the officer 
OF/LOS/grade categories. This is a difficult problem 
because of the large number of cells with low inventory 
figures. We will refer to this as the "small cell" problem;
it is this small cell problem that is of greatest concern to
the builders and users of these manpower models.

Because of the very large number of cells and their
heterogeneous nature, it is wise to collect cells into major
groups, or aggregates, which conform to certain assumptions
concerning their statistical behavior. If these assumptions
are at least approximately valid for the aggregates, then
certain theoretical models can be used to predict attrition
rates more accurately than current practices allow. Amin
Elseramegy [Ref. 2] explored the aggregation problem using 
the CART routine with encouraging results. But these 
results were not available or usable in a timely fashion to 
be included in the present study. This thesis and Tucker 
[Ref. 1] assume valid aggregations can be found, and explore 
the performance of likely estimation schemes. However, the 
aggregates used in these pilot studies conform to current 
Marine Corps practice. These were selected on grounds that 
conform to organizational and operational considerations, 
and are unlikely to be related to a choice made on the basis 
of the statistical modeling behavior. 

Current Marine Corps practice places all OF's in four 
categories: aviation (OF 72, 75), combat support (OF 13, 
25, 35), ground combat (OF 03, 08, 18), and other.
Aggregates are formed from these categories by taking data
by grade. Past attrition rates, from 1977 to the present,
weighted by subjective judgement, are used to predict future
attrition. An average attrition rate (the grand mean) is 
computed for the entire aggregate, and this rate is used as 
an estimate for all cell attrition rates in the aggregate. 
We expect to improve substantially over this method. 

It should be noted that the aviation category used by 
Tucker [Ref. 1] included only OF 75. For continuity, this is
continued in the present work.


C. PROGRESS 
Tucker [Ref. 1] showed the James-Stein shrinkage
estimator [Refs. 3,4] can

(1) greatly improve on current methods,

(2) improve, in a global sense, over maximum likelihood
methods, and

(3) provide estimates for those small cells which have
had no attrition, i.e., those cells whose MLE must
equal zero.

Present work continues this investigation in that 

(1) the measures of efficacy (risk) are decomposed so
that the effects of the method can be separately
examined for small and large cells, and

(2) a class of extensions of the James-Stein estimation, 
called limited translation shrinkage estimation, is 
applied to the aggregates studied by Tucker [Ref. 1]. 

The main purpose is to sharpen the treatment given the small
cells.


D. RESULTS 

It appears the limited translation technique adds to the
efficacy of the James-Stein estimates (see Chapter IV), in
that the estimation of rates for small cells has improved.
Also, an estimator designated the transformed scale cell
average (TSCA), which is a version corresponding to zero
James-Stein shrinkage, has been shown to be an efficient
estimation technique, often outperforming all other schemes
examined here. The various methods are quite competitive,
and at this time there is no clear choice. We believe that
better aggregation methods need to be applied prior to
attempting to choose among these methods.


E. ORGANIZATION 

Chapter II contains the details of methodology and nota- 
tion necessary to the present work. A brief summary of 
James-Stein estimation is presented, with emphasis on its 
implementation in the present work. 

Chapter III explains the limited translation extension, 
together with the theoretical curves that help anticipate 
the effect of this option. 

Chapter IV contains the numerical summaries and
tabulation of the figures of merit (FOM) for the various
techniques.

Chapter V thoroughly discusses the results, including
recommendations, and lists additional areas needing
examination.

The appendices document certain details of interest to 
the reader desiring a greater in-depth knowledge of the 
methods.


II. RELATED ESTIMATION METHODS


A. GENERAL 
This chapter describes the estimation method currently
in use by the Marine Corps, and four other methods of
estimation pertinent to the present work.


B. BACKGROUND 

As explained by Tucker [Ref. 1], the performance of the
estimation schemes is compared on two scales, transformed
and original. The transformed scale is the range space of
the Freeman-Tukey transform (see Appendix B) that helps
stabilize the variance of the ordinary empirical rates
assuming they are described by the binomial model. On this
scale the transformed quantities are treated as normally 
distributed random variables with common variance. It is in 
this setting that the James-Stein estimator is derived and 
can be expected to perform well. Because the rates are low 
and because many cells are small, we cannot assert with 
confidence that the rates on the transformed scale are 
approximately normal with common variance. The ultimate 
value must be judged in terms of cross-validation, i.e., 
comparing the estimates with like transformed values of 
future actuals. Following Tucker [Ref. 1] we have chosen the 
first four years of data of the seven available to estimate 
rates, and the last three years for validation. 

Although comparisons on the transformed scale are 
valuable for purposes of understanding the behavior of 
shrinkage estimators, they do not supplant the need to study 
behavior on the original scale. Hence, the transformed
estimators must be inverted to estimated rates on the
original scale, and then validated against original scale
actuals. The traditional chi-square goodness-of-fit
statistic was chosen to do this. It is a weighted sum of
squared deviations measure.

The requirement to work on a transformed scale introduces
some additional complications. To illustrate the
point, consider the following choice. Should the empirical 
rate for each cell of the four years be transformed to the 
new scale, or should we first sum the leavers and the inven- 
tory over the four years in order to produce a more stable 
cell rate prior to transforming? It turns out that if the 
latter is chosen then we have no reasonable way to estimate 
the within-group variance on the transformed scale. Hence 
the former is chosen. This done, the transformed quantities
are averaged over time to produce a single figure for each
cell prior to shrinkage. (This is the TSCA mentioned in
Chapter I, Section D.)
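
The choice just described (transform each year's cell rate, then average over time) can be sketched as follows. This is a minimal illustration, not the thesis code: the function names are hypothetical, and the transform is assumed to take the double-arcsine form that Chapter III uses when casting rates into the transformed scale.

```python
import math

def freeman_tukey(leavers: float, inventory: float) -> float:
    """Double-arcsine transform of one cell-year (assumed form)."""
    n = inventory
    first = math.asin(2.0 * leavers / (n + 1.0) - 1.0)
    second = math.asin(2.0 * (leavers + 1.0) / (n + 1.0) - 1.0)
    return 0.5 * math.sqrt(n + 0.5) * (first + second)

def tsca(leavers_by_year, inventory_by_year):
    """Transform each year separately, then average over the years."""
    z = [freeman_tukey(y, n)
         for y, n in zip(leavers_by_year, inventory_by_year)]
    return sum(z) / len(z)
```

Because each year is transformed before averaging, the yearly values remain available for a within-cell variance estimate, which is the reason given above for preferring this order.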

On the other hand, the latter figure is still useful 
since it is the maximum likelihood estimator of the cell 
rate on the original scale. But there is still the question
of how to use it, for comparison purposes, on the
transformed scale. We have chosen to use the four year average
cell inventory in conjunction with this MLE rate and then
apply the arcsine transformation. See equation B.1.

To avoid confusion, an estimate will always be referred 
to by the name given when initially calculated. For example, 
the maximum likelihood estimate (MLE) is calculated 
initially on the original scale, and still will be called
the MLE when on the transformed scale. Also, the term
"maximum likelihood" may be used to refer to maximum
likelihood estimation in the setting at hand. Thus the TSCA is a
set of specific maximum likelihood estimators as it refers
to the "IID normal with common variance" setting imposed on
the transformed scale.


C. AGGREGATION

The current Marine Corps attrition rate estimation meth- 
odology described in Chapter I is an aggregation scheme, and 
as such has several weaknesses. First, a single rate is 
applied to all cells in the aggregate. This does not take 
into account actual differences in cell inventory or losses. 
The pattern of losses can differ greatly among cells in an 
aggregate whose composition is arbitrarily determined. 

Also, an aggregation scheme has difficulty handling 
cells that have had zero inventory for the estimation 
period, i.e., cells whose MLE would be zero. The application 
of the aggregate rate in this case is clearly an 
overestimate. 

Two constant rates are used in the present work for 
comparison purposes. The first is the aggregate rate calcu- 
lated on the original scale, called hereafter the original 
scale aggregate. This rate is the total losses divided by 
the total inventory, to be applied to all cells in the 
aggregate, and is a single number. For comparison on the 
transformed scale, this rate is mapped into the transformed
space using the arcsine transformation

$$ X_i = (N_i + 0.5)^{1/2} \sin^{-1}(2p - 1), \quad i = 1, \ldots, K \qquad (2.1) $$

where p is the aforementioned single rate, and the $N_i$ are
average cell inventories over time. This results in
different transformed cell means (transformed rates) because 
of differing cell inventories. Generally the subscript i 
indexes a combination of LOS and OF. 
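
As an illustration of the original scale aggregate and its mapping by equation (2.1), consider the sketch below. The counts are invented for the example, the function names are hypothetical, and the factor in (2.1) is assumed to be the square root of $N_i + 0.5$.

```python
import math

def aggregate_rate(leavers, inventory):
    """Total losses divided by total inventory over the whole aggregate."""
    return sum(leavers) / sum(inventory)

def to_transformed(p, avg_inventory):
    """Equation (2.1): X_i = (N_i + 0.5)^(1/2) * asin(2p - 1)."""
    return math.sqrt(avg_inventory + 0.5) * math.asin(2.0 * p - 1.0)

# Hypothetical aggregate of three cells, counts summed over four years.
leavers = [0, 3, 12]
inventory = [8, 40, 152]
p = aggregate_rate(leavers, inventory)      # 15 / 200
avg_inventory = [2.0, 10.0, 38.0]           # four-year average inventories
x = [to_transformed(p, n) for n in avg_inventory]
```

The single rate p produces three different transformed values, one per cell, because the average inventories differ.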

The second rate is calculated on the transformed scale, 
and is called hereafter the transform aggregate. It is 
computed by averaging the cell figures that result from 
applying the Freeman-Tukey transformation. On the
transformed scale this is a single number and in fact is the
grand mean of the TSCA figures. When inverted onto the
original scale for comparison purposes using equation A.12
of Appendix A, the average cell inventory is used, resulting
in different rates for each cell.


D. MAXIMUM LIKELIHOOD ESTIMATION (MLE) 

The present work calculates an MLE for comparison 
purposes. As stated in Section B above, this rate is the 
total leavers (over time) divided by the total inventory 
(over time), using the four year estimation period. This 
empirical rate is called MLE because it would be the maximum 
likelihood estimate in the setting of independent Bernoulli 
trials. We retain this terminology on the transformed scale. 

It is well known that this MLE is the best unbiased 
estimator if the Bernoulli setting is tenable. The problems 
using the MLE here are threefold. First, the smaller the 
number of cell trials the greater the variability in the 
estimation. Thus while the estimate is unbiased, the range 
of values the estimate can easily assume is large. A stable 
estimate cannot be made. 
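
The first problem can be made concrete with the binomial standard error of an empirical rate. The sketch below is illustrative only; the rate of 0.10 and the trial counts are hypothetical values, not figures from the Marine Corps data.

```python
import math

def binomial_std_error(p, trials):
    """Standard deviation of the empirical rate y/n under Bernoulli trials."""
    return math.sqrt(p * (1.0 - p) / trials)

true_rate = 0.10  # hypothetical attrition rate
for trials in (5, 50, 500):
    se = binomial_std_error(true_rate, trials)
    print(f"n={trials:4d}  std. error={se:.3f}")
```

With only five trials the standard error (about 0.13) is larger than the rate being estimated, which is the small cell problem in miniature.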

Second, this MLE assumes each data set is drawn from 
identical populations. Service retention is greatly affected 
by changing economic, political, and social forces. These 
forces are not constant, and over a period of three to six 
years the behavior of a cell can change radically. This 
introduces the yearly update problem. The requirement to 
drop old data as new data becomes available keeps the cell 
trials low, and the variability of the estimate high. In 
short, we are not yet equipped to build an estimation scheme 
based upon manpower flow model structural conditions. 

Third, manpower planning cannot focus on an individual 
OF/LOS/grade category. Since Marine Corps officers approxi- 
mate a hierarchical system, the requirement to promote an 
officer to fill a projected loss of, for example, an 
infantry Lieutenant Colonel will "ripple" down to a
requirement to recruit an infantry Second Lieutenant.


E. TRANSFORMED SCALE CELL AVERAGE (TSCA)

Tucker [Ref. 1] and the present work calculate a TSCA
rate. This rate can be viewed as a maximum likelihood
estimator calculated on the transformed scale, using transformed
inventory and loss data. If the "normal with common vari- 
ance" model were firmly defensible on the transformed scale, 
the TSCA would provide the best linear unbiased estimators 
of the individual cell means. The method is accorded sepa- 
rate treatment because of the excellent results, especially 
in the near term (one year) validations. 

Stein [Ref. 3] in 1955 examined the performance of this 
maximum likelihood estimator in predicting cell values. He 
established that if the number of cells is at least three, 
then maximum likelihood estimation can be improved in an
overall sense. The criterion he used was the global loss,

$$ L(\theta, a) = \sum_i (\theta_i - a_i)^2 \qquad (2.2) $$

where $\theta$ is the array of unobservable true cell values and $a$
is the array of predicted cell values. This global loss is
the sum across all cells of the individual loss

$$ (\theta_i - a_i)^2 \qquad (2.3) $$

where $\theta_i$ and $a_i$ are the appropriate values for the ith cell
in the aggregate.


F. JAMES-STEIN ESTIMATION 

James and Stein [Ref. 4] developed an estimator, called 
the James-Stein estimator, which reduces the expected value 
of the global loss, when compared to the cell means. The
expected value is called the risk R:

$$ R = E[L(\theta, a)] = \sum_i E[(\theta_i - a_i)^2] \qquad (2.4) $$

Two assumptions were made:
(1) the cell values are normally distributed, and
(2) the within cells variance is constant.

The basic idea of James-Stein estimation is the farther 
the cell mean is from the overall mean, the greater is the 
size of the residual error. Note that the cell mean is the 
TSCA, and the overall mean is the transform aggregate, or
grand mean. All means are moved, or shrunk, toward the 
grand mean. The amount of shrinkage is proportional to the 
absolute distance from the grand mean, i.e., the greater the 
absolute distance, the greater is the shrinkage. 

There are, however, problems with this method. First, a
natural objection is that some cell attrition rates may be
far from the grand mean simply because the long term
attrition from these cells differs greatly from the majority of
cells in the same aggregate. To shrink the attrition rates
of these cells toward the grand mean may be erroneous.

Second, in dealing with a sample in the original scale, problems
occur when there is no cell loss for the entire estimation 
period, i.e., when the MLE is zero. Tucker [Ref. 1] handled 
this by omitting such cells from all comparisons. In the 
present work these zero loss cells are retained in an effort 
to view the effect of the various schemes on the small cell 
rate estimations. 

Third, Appendix A demonstrates the Freeman-Tukey
transformation does not normalize the cell means or stabilize the
variance when the inventory or loss rates are low. Since
normality with common variance is the basic assumption of
the James-Stein scheme, the reliability of the results must 
be questioned. 

See Appendix A for the James-Stein estimation algorithm 
as used by Tucker [Ref. 1]. 

See James and Stein [Ref. 4] and Tucker [Ref. 1] for
details on James-Stein estimation.


G. ROBUST PARAMETRIC EMPIRICAL BAYES ESTIMATION

An alternative method of analysis, unrelated to the 
schemes investigated here, is the robust parametric empir- 
ical Bayes (RPEB) model, suggested by D. P. Gaver. While 
relatively new, the method has shown promise in the settings 
to which it has been applied. 

Because of time constraints, this model was not
implemented. The procedure is a significant departure from the
present work, but it may offer important benefits to small
cell estimation. Appendix F is a brief description of the
technique.


III. LIMITED TRANSLATION JAMES-STEIN ESTIMATION


A. GENERAL 

This chapter discusses the limited translation model,
and the validation approaches taken in this paper. 

As stated in Chapter II, it is intuitively unsettling to
shrink all empirical cell rates toward the grand mean by the
same affine translation. Also, one must question whether the
risk could be further reduced from that of the James-Stein 
estimator. Two articles published by Efron and Morris 
[Refs. 5,6] present such a method: limited translation of 
the James-Stein estimator. 

To compromise between James-Stein and TSCA estimation, 
and to limit the translation of extreme values, an interval 
[-C,C] centered about the grand mean is established. Inside 
this interval all rates are translated using full 
James-Stein shrinkage. Outside this interval the amount of 
shrinkage is reduced the farther cell values get from the 
interval. The shrinkage approaches zero in the limit. 

To get an intuitive feel for the differences between 
James-Stein and limited translation James-Stein estimation, 
compare Figures 2.1 and 3.1. Figure 2.1 shows how the 
James-Stein technique shrinks all values toward the grand 
mean. Figure 3.1 shows how limited translation estimation 
reduces the shrinkage outside a certain range of values 
centered about the grand mean. 

Theoretically, limited translation estimation, by
shrinking some cells less than fully, will slightly increase
the global risk over that of the James-Stein estimator. This
increase is acceptable since the individual cell risk of extreme
inventory cells is decreased. This means the estimators of the
small cell attrition rates improve, usually significantly,
at a small cost to the middle inventory cells.


B. THE SHRINKAGE FACTOR
1. The Limited Translation Algorithm 
Conversion of the James-Stein estimation algorithm 
to limit the translation is straightforward. From Appendix 
A, after the attrition rates have been transformed using the 
Freeman-Tukey transformation, the James-Stein estimator $P_J$
is calculated,

$$ P_J = \bar{X} + C_J (X_{i.} - \bar{X}) \qquad (3.1) $$

where

$$ C_J = 1 - (K-3)\,SSE / [(K(T-1)+2)\,SSB] \qquad (3.2) $$

is the shrinkage factor.


To modify $C_J$ for limited translation, let

$$ \rho(u) = \min(1, d/u^{1/2}) \qquad (3.3) $$

where

$$ u = (X_{i.} - \bar{X})^2 / (A+1). \qquad (3.4) $$

Note that

$$ A = (K(T-1)+2)\,SSB / [(K-3)\,SSE] - 1 \qquad (3.5) $$

and is the variance of the prior distribution of $\theta$. The
value of d is chosen from the interval $[0, \infty]$.

The new shrinkage factor is

$$ C_{LJ} = 1 - \rho(u)\,[(K-3)\,SSE / ((K(T-1)+2)\,SSB)] \qquad (3.6) $$

and the new estimator is

$$ P_{LJ} = \bar{X} + C_{LJ} (X_{i.} - \bar{X}). \qquad (3.7) $$

If $d/u^{1/2}$ is equal to or greater than 1, then full shrinkage
occurs. However, if $d/u^{1/2}$ is less than 1, then full
shrinkage does not occur.
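
The algorithm of this subsection can be sketched as below. It is a hedged illustration rather than the thesis implementation: the function names are hypothetical, the relevance function is assumed to be min(1, d/u^(1/2)) with u the squared deviation from the grand mean divided by A + 1, and the bracketed term of (3.6) is passed in as a single number, shrink = 1/(A + 1).

```python
import math

def relevance(u, d):
    """Equation (3.3) as assumed here: rho(u) = min(1, d / sqrt(u))."""
    if u == 0.0:
        return 1.0  # a cell sitting on the grand mean is unaffected either way
    return min(1.0, d / math.sqrt(u))

def limited_translation(cell_means, shrink, d):
    """Apply (3.4), (3.6) and (3.7) to a list of transformed cell means.

    shrink is the bracketed term of (3.6), i.e. 1 / (A + 1).
    """
    grand_mean = sum(cell_means) / len(cell_means)
    estimates = []
    for x in cell_means:
        u = (x - grand_mean) ** 2 * shrink       # (x - mean)^2 / (A + 1)
        c_lj = 1.0 - relevance(u, d) * shrink    # equation (3.6)
        estimates.append(grand_mean + c_lj * (x - grand_mean))
    return estimates
```

In this sketch an infinite d reproduces the James-Stein estimate for every cell, while d = 0 leaves the cell means untouched. For intermediate d, the amount a distant cell is moved is capped at d times the square root of shrink, which is the sense in which the translation is limited.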
2. The Shrinkage Interval 
The choice of a d value is important. The larger d
is, the larger the interval [-C,C] becomes. If d becomes
infinite, then the estimator $P_{LJ}$ is simply the James-Stein
estimator $P_J$. This would result in no change in the global
or individual risks. However, if $d/u^{1/2}$ is less than one for
some cell values, then those cells will not fully shrink,
and there will be an improvement in the small cell
individual risk. If d equals zero, then the interval [-C,C]
shrinks to a point at the value of the grand mean. By (3.3)
the factor $\rho(u)$ is then zero for every cell away from that
point, so no cell is shrunk and the estimator reduces to the
TSCA. Thus, we want some intermediate value of d. Appendix E
discusses the theoretical implications of d, and methods of
choosing values.

The effects of d can be directly observed in the 
pattern of the limited translation shrinkage factors. Within 
[-C,C], the cell shrinkage factors will equal the
James-Stein factor. The farther one gets from the full 
shrinkage interval, the smaller the shrinkage factors are. 
Note the shrinkage factor approaches zero in the limit. 

Caution must be exercised when viewing the shrinkage 
factor pattern in this light. The interval [-C,C] is an 
interval of cell values, in this case cell means. Therefore, 
if the cell value pattern is not apparent, the pattern of
shrinkage factors will also not be apparent.


C. VALIDATION

The development of the James-Stein scheme took place in 
a restrictive setting. It is useful to state that setting in 
a form most closely associated with our problem: 


$$ X_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, \ldots, k \text{ and } j = 1, \ldots, n \qquad (3.8) $$

with the $(\epsilon_{ij})$ as IID $N(0, \sigma^2)$ random variables. Since this
is the same setting as that of the one way analysis of
variance, it is convenient to use the quantities and notation of
ANOVA. Specifically, let

$$ SSE = \sum_i \sum_j (X_{ij} - X_{i.})^2, \qquad (3.9) $$

$$ SSB = n \sum_i (X_{i.} - X_{..})^2, \qquad (3.10) $$

$$ X_{i.} = (1/n) \sum_j X_{ij}, \text{ and} \qquad (3.11) $$

$$ X_{..} = (1/k) \sum_i X_{i.}. \qquad (3.12) $$


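Since we will be using equations (3.2), (7.3), (7.4),
and (7.7) of reference 5, we record the identifications

$$ \hat{\sigma}^2 = SSE / (2 + k(n-1)), \qquad (3.13) $$

$$ V = SSB / n, \qquad (3.14) $$

$$ X_i = X_{i.} \text{ for } i = 1, \ldots, k, \qquad (3.15) $$

$$ \bar{X} = X_{..}, \text{ and} \qquad (3.16) $$

$$ \rho(u) = \min(1, d/u^{1/2}). \qquad (3.17) $$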


On the other hand our cell attrition data is fairly
modeled with the binomial distribution [Ref. 1] such that
(1) $y_{ij}$ = number of leavers in cell i during period j,
(2) $N_{ij}$ = central inventory,
and the empirical cell attrition rates are

$$ \hat{p}_i = (\sum_j y_{ij}) / (\sum_j N_{ij}). \qquad (3.18) $$

Because of the large number of small cells, the above
empirical probabilities are unstable and it should be possible to
improve the stability by shrinking them.

If we treat the $\hat{p}_i$'s as the $X_{i.}$ in the ANOVA
setting, we encounter some flaws. Namely, the common
variance and normality assumptions are severely compromised, and
there is no obvious way to estimate $\sigma^2$. The same flaws are
present if we back off and use the cell empirical rates,
$y_{ij}/N_{ij}$, in the role of $X_{ij}$.
In hopes of giving relief to this problem we (see
reference 1) have chosen to use the Freeman-Tukey variance
stabilization transform

$$ z_{ij} = 0.5\,(N_{ij} + 0.5)^{1/2} \left( \sin^{-1}[2 y_{ij}/(N_{ij}+1) - 1] + \sin^{-1}[2(y_{ij}+1)/(N_{ij}+1) - 1] \right) $$

in the role of the $X_{ij}$. For the small cells we still have
the flaws but not to as great an extent as before. See
Appendix B.

This leads us to perform some validation
computations using the transformed scale, i.e., the sum of squared
error risk function will be computed with the error input
given by the difference between the $z_{it}$ (for the tth cross
validation year) and the James-Stein shrunk value $\hat{z}_i$
computed from (7.7) of reference 5, i.e.,

$$ \hat{z}_i = z_{..} + (1 - C)(z_{i.} - z_{..}) \qquad (3.20) $$

and

$$ C = (K-3)\,SSE / [(2 + K(n-1))\,SSB]. \qquad (3.22) $$


These risks will serve to tell us whether the shrinkage 
technique is behaving as expected in spite of the rough 
treatment given the assumptions in the setting. 
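
The transformed scale computation can be sketched as follows. This is an illustrative reading, not the thesis program: the function name is hypothetical, the data are assumed balanced (every cell has the same number of years), and the shrunk value is assumed to be the grand mean plus (1 - C) times the deviation of the cell mean, with C as in (3.22).

```python
def james_stein(x):
    """x[i][j] is the transformed value for cell i in year j.

    Returns the shrinkage amount C of (3.22) and the shrunk cell values.
    Requires more than three cells and some spread between cell means.
    """
    k, n = len(x), len(x[0])
    cell_means = [sum(row) / n for row in x]                  # (3.11)
    grand_mean = sum(cell_means) / k                          # (3.12)
    sse = sum((v - m) ** 2
              for row, m in zip(x, cell_means) for v in row)  # (3.9)
    ssb = n * sum((m - grand_mean) ** 2 for m in cell_means)  # (3.10)
    c = (k - 3) * sse / ((2 + k * (n - 1)) * ssb)             # (3.22)
    shrunk = [grand_mean + (1.0 - c) * (m - grand_mean)
              for m in cell_means]
    return c, shrunk
```

The validation risk on this scale is then the sum over cells of the squared difference between each shrunk value and the transformed actual for the validation year. No positive-part correction is applied in this sketch, so a value of C above one would move cell means past the grand mean.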

It is also important to make validation computations in 
the original scale. Even though the risks in the transformed 
scale look attractive, the attrition rate estimates must be 
converted back to the original scale. For this purpose we
propose the chi-square statistic

$$ \chi^2 = \sum_i (y_{it} - N_{it}\hat{p}_i)^2 / [N_{it}\hat{p}_i(1 - \hat{p}_i)] \qquad (3.23) $$

where $y_{it}$ and $N_{it}$ are the leavers and central inventory
counts for the ith cell in the tth validation period, and

$$ \hat{p}_i = 0.5\,[1 + \sin(\hat{z}_i / (N_i + 0.5)^{1/2})], \text{ and} \qquad (3.24) $$

$$ N_i = (1/n) \sum_j N_{ij} \qquad (3.25) $$

provided the argument of the sine function belongs to
$[-\pi/2, \pi/2]$. Outside this range, we use $\hat{p}_i$ = 0 or 1
according to whether the argument is below $-\pi/2$ or above $\pi/2$,
respectively. The above value, $\hat{p}_i$, is what we will call the
James-Stein attrition rate generator.
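
The original scale validation can be sketched in two steps: invert a transformed estimate to a rate, then accumulate the chi-square terms. The function names are hypothetical, and the inversion assumes the single-arcsine relation X = (N + 0.5)^(1/2) asin(2p - 1), so that arguments below -pi/2 map to a rate of zero as the text requires.

```python
import math

def rate_from_transformed(z_hat, avg_inventory):
    """Invert the arcsine scale, clipping the sine argument to [-pi/2, pi/2]."""
    arg = z_hat / math.sqrt(avg_inventory + 0.5)
    if arg <= -math.pi / 2:
        return 0.0
    if arg >= math.pi / 2:
        return 1.0
    return 0.5 * (1.0 + math.sin(arg))

def chi_square(leavers, inventory, rates):
    """Sum of (y - N p)^2 / (N p (1 - p)) over cells with defined terms."""
    total = 0.0
    for y, n, p in zip(leavers, inventory, rates):
        if n > 0 and 0.0 < p < 1.0:  # terms with N = 0 or p in {0, 1} are undefined
            total += (y - n * p) ** 2 / (n * p * (1.0 - p))
    return total
```

Skipping the undefined terms mirrors the removal of zero-inventory cells and of estimates equal to zero or one that is discussed for the aggregate and maximum likelihood methods.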

Above we have described two validation risk calcula- 
tions, one in the transformed scale and one in the original 
scale. Let us now address the question of "To what are these 
risks to be compared?". The general answer is to make like 
calculations for the other estimation schemes: original and 
transformed scale aggregates, TSCA, maximum likelihood, and 
the limited translation. To be specific, we must address 
some details. The first four will be taken up here and the 
limited translation modification will be discussed later. 

We will discuss the easiest cases first: aggregate and
maximum likelihood in the original scale. Let i index all
cells in the aggregate. We define an indicator variable

$$ D_i = \begin{cases} 1 & \text{if } N_i > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3.26) $$

then

$$ k = \sum_i D_i. \qquad (3.27) $$

k is thus the number of cells in the aggregate with non-zero
inventory.

Now, if we define another indicator variable

$$ D'_i = \begin{cases} 1 & \text{if } N_i > 0 \text{ and } p_{MLE}(i) \neq 0 \text{ or } 1 \\ 0 & \text{if } N_i = 0 \text{ or } p_{MLE}(i) = 0 \text{ or } 1 \end{cases} \qquad (3.28) $$

then

$$ k' = \sum_i D'_i. \qquad (3.29) $$

k' is thus the number of cells in the aggregate with
non-zero inventory and the MLE not equaling zero or one.


The aggregate attrition rate is

$$ p = (\sum_{i,j} y_{ij}) / (\sum_{i,j} N_{ij}) \qquad (3.30) $$

and this value is inserted in place of $\hat{p}_i$ in the chi-square
statistic. For the maximum likelihood method we use instead

$$ p_{MLE}(i) = (\sum_j y_{ij}) / (\sum_j N_{ij}). \qquad (3.31) $$

In both cases we must remove all terms in the sum for which
$N_{ij}$ is zero. This has the effect of reducing k. Also, since
we may have some $p_{MLE}(i)$ = 0 or 1, these terms are removed,
leading to k' risk terms in the sum. It is recommended that
such results be multiplied by k/k' in order to provide a 
fair comparison with methods that provide positive non-unity 
estimators for all cells. 
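
The recommended k/k' adjustment can be sketched as below. The function name and the sample figures are hypothetical; the counts follow the indicator definitions (3.26) through (3.29).

```python
def rescaled_risk(risk_sum, avg_inventory, mle_rates):
    """Scale a risk summed over k' usable cells up to k occupied cells.

    k counts cells with non-zero inventory; k' also requires an MLE that
    is strictly between zero and one. Assumes k' is at least one.
    """
    k = sum(1 for n in avg_inventory if n > 0)
    k_prime = sum(1 for n, p in zip(avg_inventory, mle_rates)
                  if n > 0 and 0.0 < p < 1.0)
    return risk_sum * k / k_prime, k, k_prime

# Four cells: one empty, one occupied with no losses, two usable.
adjusted, k, k_prime = rescaled_risk(6.0, [0, 3, 12, 40],
                                     [0.0, 0.0, 0.25, 0.10])
```

Here k is 3 and k' is 2, so a raw risk of 6.0 over two usable cells is reported as 9.0, putting it on the same footing as a method that contributes a term for all three occupied cells.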

Turning to the risk calculation in the transformed 
scale, there are some alternative ways to decide what we 
call an aggregate or a TSCA estimator. The question arises 
because to each cell there are associated two numbers, $y_{ij}$
and $N_{ij}$.

If performance in the transformed scale were the only 
concern, we would simply use $z_{..}$ for the transformed
aggregate and $z_{i.}$ for the TSCA estimator. This in fact was done
by Tucker [Ref. 1]. But because of the varying inventory,
these values, $z_{..}$ and $z_{i.}$, do not correspond to p and
$p_{MLE}(i)$. For purposes of the present study it was decided
to cast these two values into the transformed scale:

$$ z_A(i) = 0.5\,(N_i + 0.5)^{1/2} \left( \sin^{-1}[2 N_i p/(N_i+1) - 1] + \sin^{-1}[2(N_i p + 1)/(N_i+1) - 1] \right) \qquad (3.32) $$

where $N_i$ is the average inventory for the ith cell. This
choice leads to the original scale aggregation in the
transformed scale varying from cell to cell. It is fair to do
this because the varying inventory allowance is a part of
the process that helps stabilize the value in the
transformed scale.


For the same reason, and writing $p_i$ for $p_{MLE}(i)$, we use

$$ z_{MLE}(i) = 0.5\,(N_i + 0.5)^{1/2} \left( \sin^{-1}[2 N_i p_i/(N_i+1) - 1] + \sin^{-1}[2(N_i p_i + 1)/(N_i+1) - 1] \right) \qquad (3.33) $$

for the maximum likelihood estimator in the transformed
scale.


IV. RESULTS


A. GENERAL 
This chapter displays various data results for the
estimation methods compared in this paper.


B. SHRINKAGE FACTORS 

The shrinkage factors for the James-Stein (JS) and the 
limited translation James-Stein (LTJS) methods are
presented, by aggregate, in Table 1 through Table 3. An
aggregate is a specific OF group and grade, e.g., the aviation
First Lieutenants, or the combat support Lieutenant
Colonels. One James-Stein factor is calculated for an entire 
aggregate, while the limited translation technique assigns a 
factor to each cell based on the inventory and the chosen 
value of d. If the distributional assumptions are reason- 
able, a properly selected d will force the middle inventory 
cells to have equal James-Stein and limited translation 
factors, with the limited translation factors getting 
smaller the farther one is from this middle inventory range.
As discussed in Appendix B, these assumptions were not met. 
Use of a graph of relative savings loss versus d, for 
different validation years, yielded inconclusive results, 
except for the aviation First Lieutenant aggregate. See 
Appendix E. Since a point of emphasis of this study is small 
cell estimation, d values were chosen to force the limited 
translation shrinkage factors to act in accordance with 
theoretical patterns, as discussed in Chapter III and refer- 
ence 5. The purpose was to evaluate the resulting small 
cell risk values. See Table 4. 

The shrinkage factors exhibit interesting behavior. 
Within the interval [-C,C] the limited translation shrinkage
equals James-Stein. Once outside the interval, the limited
translation factor is reduced. For example, consider Table
2. The code 7 (Engineers) Lieutenant Colonels have a
James-Stein shrinkage of .1720. All limited translation
factors are less than or equal to this. Those that are equal
have cell means (on the transformed scale) within [-C,C].
Note that an isolated cell can fall within this interval,
such as the factors for LOS's of 15, 17, 20, and 29. For
those cells with means outside [-C,C], the respective limited
translation shrinkage is less than .1720. See Chapter III,
Section B, Subsection 2. This behavior is non-monotone with
respect to LOS, and upon reflection, should be anticipated.
The pattern of officer losses is tied to contract
expirations, reduced promotional expectations, and retirement
options.


C. FIGURES OF MERIT 

Table 5 through Table 7 display the figures of merit for
the six estimation schemes in transformed and original
space. As expected, both aggregate estimation methods have
uniformly higher risks than the other techniques.

Several points are worth noting. First, in the trans- 
formed scale, limited translation ranks first overall in 
lowest risk. This is contrary to the theory as developed by 
Efron and Morris [Refs. 5,6]. As noted in Chapter III,
limited translation should lower the individual cell risks 
of those cells outside the interval [-C,C], but at the cost 
of an increase in global risk. 

When the FOM is examined by grade, TSCA is always best 
for First Lieutenants, and James-Stein is always best for 
Lieutenant Colonels. This effect appears to be unchanging. 

In the original scale, rankings change over time. For 
1981, TSCA is best overall and for each grade. This is also 
true for 1982, but to a lesser degree. In 1983, limited
translation is the best overall and for each grade.

TABLE 1
AVIATION SHRINKAGE FACTORS

CODE 38
LOS     1st LT    LTCOL
JS      .0128     .0322
LTJS    [per-LOS entries not recoverable from the source]

The excellent performance of the TSCA estimate is worth 
noting. TSCA may be thought of as the James-Stein estimator 
with zero shrinkage. Recognizing that shrinkage is an esti- 
mated parameter, we are surprised in those cases for which 
the estimated shrinkage is large, and TSCA outperforms
James-Stein and limited translation. See Tables 3 and 6.


D. SMALL CELL FIGURES OF MERIT 
The risk associated with the small cells was investigated
to determine which technique best predicts small cell
attrition since the global figures of merit in Tables 4 thru
6 may be disguising what is happening in the small cells.
The small cell figures of merit tables record the small cell
contribution to the global risk. Table 8 thru Table 13
display these results. Two average inventory ranges were
examined: zero to five, and six to ten.

For the range zero to five, the results are varied. In the
transformed scale, the TSCA method ranks first for 1981,
limited translation first for 1982, and James-Stein first
for 1983. In the original scale TSCA is first for 1981,
James-Stein first for 1982, and limited translation first
for 1983.

The most surprising result is that no clear pattern has
emerged, other than the consistently poor performance of the
two aggregate estimation schemes. The limited translation
technique has not resulted in a uniform lowering of the
small cell risk. In fact, in about one-half of the cases
James-Stein estimation resulted in a lower small cell risk
than did limited translation.


TABLE 2
COMBAT SUPPORT SHRINKAGE FACTORS

[Table body illegible in the source scan. The legible column
headings are CODE 20 and CODE 13, each with 1ST LT and LTCOL
columns.]


TABLE 3
GROUND COMBAT SHRINKAGE FACTORS

[Table body illegible in the source scan. The legible column
headings are CODE 3, CODE 5, and CODE 10, each with 1ST LT
and LTCOL columns.]



TABLE 4
D VALUES USED IN STUDY AGGREGATES


                   1ST LT         LTCOL

Aviation           .8             .4
Combat Support     [illegible]    .4
Ground Combat      [illegible]    [illegible]





Table 14 thru Table 19 list the small cell percentages
of total risk. These results are useful in determining the
actual contribution of the small cells to the total risk for
the specific aggregate, estimation scheme, and year.


E. ATTRITION RATES 

Since the ultimate purpose of this study is to produce 
attrition rates, the rates generated by the six estimation 
methods are presented in Table 20 thru Table 33. The 
differing results of each method are apparent. Of interest 
is the relative agreement among MLE, TSCA, James-Stein, and
limited translation, when compared to the two aggregate 
methods. 

Also, we are comforted that the attrition rate patterns 
follow both the pattern of the raw attrition, and experi- 
ence. Experience tells us that, for First Lieutenants, no 
attrition occurs within LOS's of zero or one because there 
are no First Lieutenants with such LOS's. A peak in attri- 
tion should occur around the eight year point, because most 
officers have been promoted to Captain, and those remaining 
resign to pursue civilian careers. Another peak should occur 
at twenty years since at this point First Lieutenants who
have enlisted time start to retire.

Tucker [Ref. 1] demonstrated this loss pattern in a
series of graphs (Figures A.1 thru A.10 in reference 1)
which display raw loss rates for selected grades and OF's.
These graphs show an increase in attrition rate peaking at
eight years for First Lieutenants, and again at twenty
years.

The rates for TSCA, James-Stein, and limited translation 
follow this pattern. MLE does also, in general. However, MLE 
will predict zero attrition if the cells have been empty for 
the entire estimation period, as it does in Table 22. The 
predicted rates for LOS 13 thru 19 are zero, while the above
three schemes estimate rates ranging from .07 to .09.

The aggregate methods do not display this pattern. In
fact, the transformed scale aggregate estimate is at a local
minimum when the above schemes reach local maximums. The
original scale aggregate estimates, of course, remain
constant except for those cells forced to zero by zero cell
inventories.

[The bodies of Tables 5 thru 33 are illegible in the source
scan. Only the captions are retained below.]

TABLE 5
AVIATION FIGURES OF MERIT

TABLE 6
COMBAT SUPPORT FIGURES OF MERIT

TABLE 7
GROUND COMBAT FIGURES OF MERIT

TABLE 8
AVIATION SMALL CELL FOM
(INV ≤ 5)

TABLE 9
AVIATION SMALL CELL FOM

TABLE 10
COMBAT SUPPORT SMALL CELL FOM

TABLE 11
COMBAT SUPPORT SMALL CELL FOM
(6 ≤ INV ≤ 10)

TABLE 12
GROUND COMBAT SMALL CELL FOM
(INV ≤ 5)

TABLE 13
GROUND COMBAT SMALL CELL FOM
(6 ≤ INV ≤ 10)

TABLE 14
AVIATION SMALL CELL PERCENTAGE OF TOTAL FOM
(INV ≤ 5)

TABLE 15
AVIATION SMALL CELL PERCENTAGE OF TOTAL FOM

TABLE 16
COMBAT SUPPORT SMALL CELL PERCENTAGE OF TOTAL FOM

TABLE 17
COMBAT SUPPORT SMALL CELL PERCENTAGE OF TOTAL FOM

TABLE 18
GROUND COMBAT SMALL CELL PERCENTAGE OF TOTAL FOM

TABLE 19
GROUND COMBAT SMALL CELL PERCENTAGE OF TOTAL FOM

TABLE 20
AVIATION ATTRITION RATES FOR 1ST LTS
(Columns: LOS, AGG ORIG, AGG TRANS, MLE, TSCA, JS, LTJS)

TABLE 21
AVIATION ATTRITION RATES FOR LTCOLS

TABLE 22
COMBAT SUPPORT ATTRITION RATES FOR 1ST LTS

TABLE 23
COMBAT SUPPORT ATTRITION RATES FOR 1ST LTS

TABLE 24
COMBAT SUPPORT ATTRITION RATES FOR 1ST LTS

TABLE 25
COMBAT SUPPORT ATTRITION RATES FOR LTCOLS

TABLE 26
COMBAT SUPPORT ATTRITION RATES FOR LTCOLS

TABLE 27
COMBAT SUPPORT ATTRITION RATES FOR LTCOLS

TABLE 28
GROUND COMBAT ATTRITION RATES FOR 1ST LTS

TABLE 29
GROUND COMBAT ATTRITION RATES FOR 1ST LTS

TABLE 30
GROUND COMBAT ATTRITION RATES FOR 1ST LTS

TABLE 31
GROUND COMBAT ATTRITION RATES FOR LTCOLS

TABLE 32
GROUND COMBAT ATTRITION RATES FOR LTCOLS

TABLE 33
GROUND COMBAT ATTRITION RATES FOR LTCOLS

V. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 

This study investigated the performance of various 
attrition rate estimation schemes, with particular emphasis 
on small cell performance. The hope was to identify a 
specific method, the limited translation James-Stein tech- 
nique, as the best predictor of cell rates. This has not 
been done. In fact, the relative performance of four schemes 
(MLE, TSCA, James-Stein, limited translation James-Stein) 
was so varied that no clear winner or pattern of performance 
emerged. 

Several points are of interest. First, even though the
performance of the four techniques listed above varied, they 
were all uniformly better than either original or trans- 
formed scale aggregate estimation. Thus four excellent 
candidates for replacing the present attrition rate 
estimation methods are available for further testing. 

Second, the models seem very sensitive to small changes 
in parameters. When investigating the behavior of the 
Freeman-Tukey transform, and James-Stein and limited trans- 
lation James-Stein estimation, several choices had to be 
made concerning the methods used to calculate various quan- 
tities, e.g., the grand mean, maximum likelihood estimators, 
and the inventories. Large changes in FOM values and cell 
attrition rates were observed as different methods were 
tried. This lack of robustness is troublesome, and indicates 
the models are not ready for implementation. We feel a 
better method of aggregating the OF/LOS/grade cells would do 
much to relieve this problem. 

Third, in the small cells, the Freeman-Tukey
transformation fails to normalize the cell means or
stabilize the variance. This failure is an inherent aspect
of dealing with small cells. In general, the research effort
must be alert to finding alternative ways to manage the
small cell problem.


B. RECOMMENDATIONS 

At present, no method investigated here is recommended
for implementation. Further study in the following areas is
needed.


1. Aggregation. The work of Amin Elseramegy [Ref. 2]
needs to be carried forward to identify statistically
well behaved aggregates.


2. TSCA. The performance of the TSCA scheme should be
investigated in light of its surprising performance.


3. Yearly Update. The problem of when and how to update
the estimation data base must be solved.


4. Use of Different Estimators. Investigate the use of
several different estimators (such as MLE, TSCA,
James-Stein, and limited translation James-Stein),
either in combination to yield average cell estimates,
or separately to estimate the rates of different cells
based on OF, military occupation specialty (MOS) as a
subset of OF, LOS, grade, or desired estimation year.
This would require the aggregation problem to be solved.


5. Robust Parametric Empirical Bayes. Investigate the use
of the robust parametric empirical Bayes model to
determine global and small cell estimation efficacy.

APPENDIX A
ESTIMATION ALGORITHMS


1. JAMES-STEIN ALGORITHM

As used by Major Tucker [Ref. 1], this algorithm
calculates the James-Stein estimator of attrition rates. The
following formulas use a double indexing system, (i,j), to
identify cells in an aggregate and use t = 1,...,T to
identify time periods. This usage is different from that of
the main text which used the single subscript i = 1,...,k to
identify cells. The single subscript was convenient for
purposes of ANOVA. The current double indexing is needed to
access the real data in its natural form.


Notation:
I = number of LOS cells in the chosen aggregate
J = number of OF cells in the chosen aggregate
INV_{ij}(t) = inventory with LOS=i and OF=j at
beginning of year t, t=1,...,T
y_{ij}(t) = number of attritions in cell (i,j) during
year t
n_{ij}(t) = maximum (y_{ij}(t), .5[INV_{ij}(t)+INV_{ij}(t+1)])
D = incidence matrix which identifies cells with
sampling zeros (average inventory zero for all
estimation years)
D_{ij} = 0 if cell is a structural zero
         1 if cell is not a structural zero

STEP 1: Use a variance stabilizing transformation
(Freeman-Tukey).

\[ x_{ij}(t) = 0.5\,[n_{ij}(t)+0.5]^{1/2}
   \left( \sin^{-1}\!\left[ \frac{2\,y_{ij}(t)}{n_{ij}(t)+1} - 1 \right]
   + \sin^{-1}\!\left[ \frac{2\,(y_{ij}(t)+1)}{n_{ij}(t)+1} - 1 \right] \right)
   \tag{A.1} \]
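
The transformation of STEP 1 can be sketched in a few lines
of Python. This is only an illustration and not the APL code
used in the study; the function name freeman_tukey and the
clipping of the arcsine argument are assumptions added here.

```python
import math


def freeman_tukey(y, n):
    """Transformed loss x for y losses and central inventory n (STEP 1)."""
    def clipped_asin(value):
        # Guard against floating-point values marginally outside [-1, 1].
        return math.asin(max(-1.0, min(1.0, value)))

    scale = 0.5 * math.sqrt(n + 0.5)
    lower = clipped_asin(2.0 * y / (n + 1.0) - 1.0)
    upper = clipped_asin(2.0 * (y + 1.0) / (n + 1.0) - 1.0)
    return scale * (lower + upper)


if __name__ == "__main__":
    # Transformed losses for every possible loss count when n = 5.
    for y in range(6):
        print(y, round(freeman_tukey(y, 5), 4))
```

The transform is antisymmetric about y = n/2, so a cell that
loses half of its central inventory maps to zero on the
transformed scale.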


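STEP 2: Form the cell means and the grand mean.

\[ X_{ij} = (1/T) \sum_{t} x_{ij}(t), \quad \forall\, i,j \tag{A.2} \]

\[ K = \sum_{i} \sum_{j} D_{ij} \tag{A.3} \]

\[ \bar{X} = (1/K) \sum_{i} \sum_{j} D_{ij}\, X_{ij} \tag{A.4} \]


STEP 3: Form the sum of squares error (SSE), and the sum
of squares between (SSB), using the total sum of squares
(SST).

\[ SST = \sum_{i} \sum_{j} \sum_{t} D_{ij}\,[x_{ij}(t) - \bar{X}]^{2} \tag{A.5} \]

\[ SSE = \sum_{i} \sum_{j} \sum_{t} D_{ij}\,[x_{ij}(t) - X_{ij}]^{2} \tag{A.6} \]

\[ SSB = SST - SSE \tag{A.7} \]


STEP 4: Compute the set of James-Stein estimators in the
transformed scale.

\[ C_{1} = 1 - \frac{(K-3)\,SSE}{(K(T-1)+2)\,SSB} \tag{A.8} \]

\[ \delta_{ij} = \begin{cases}
   \bar{X} + C_{1}\,(X_{ij} - \bar{X}) & \text{if } D_{ij} = 1 \\
   \text{undefined} & \text{if } D_{ij} = 0
   \end{cases} \]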
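
STEPS 2 thru 4 can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not the code
used in the study: SST and SSE are taken to be the usual
one-way ANOVA sums of squares over the non-structural cells,
the estimator is assumed to shrink each cell mean toward the
grand mean by the factor of equation A.8, and the function
name james_stein and the dictionary layout are hypothetical.

```python
def james_stein(x):
    """James-Stein estimates on the transformed scale.

    x maps each non-structural cell (i, j) to its list of T
    transformed losses. Requires K > 3 cells and SSB > 0.
    """
    cells = list(x)
    K = len(cells)
    T = len(x[cells[0]])

    # STEP 2: cell means and grand mean.
    cell_mean = {c: sum(x[c]) / T for c in cells}
    grand_mean = sum(cell_mean.values()) / K

    # STEP 3: sums of squares (assumed one-way ANOVA definitions).
    sst = sum((v - grand_mean) ** 2 for c in cells for v in x[c])
    sse = sum((v - cell_mean[c]) ** 2 for c in cells for v in x[c])
    ssb = sst - sse

    # STEP 4: shrinkage factor (A.8) and shrunken cell estimates.
    c1 = 1.0 - (K - 3) * sse / ((K * (T - 1) + 2) * ssb)
    estimate = {
        c: grand_mean + c1 * (cell_mean[c] - grand_mean) for c in cells
    }
    return c1, estimate


if __name__ == "__main__":
    demo = {
        (0, 0): [1.0, 3.0],
        (0, 1): [2.0, 4.0],
        (1, 0): [5.0, 7.0],
        (1, 1): [6.0, 8.0],
    }
    print(james_stein(demo))
```

Setting the factor to one reproduces the unshrunken cell
means, which is the sense in which TSCA can be viewed as
James-Stein with zero shrinkage.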

STEP 5: Invert the transform to produce the attrition
rates r_{ij}. This corrects a typographical error that
appeared in this step in reference 1.

[Equation (A.11), which defines v_{ij}, is illegible in the
source scan.]

\[ r_{ij} = \begin{cases}
   0 & v_{ij} \le -\pi/2 \\
   0.5\,[1 + \sin(v_{ij})] & -\pi/2 < v_{ij} < \pi/2 \\
   1 & v_{ij} \ge \pi/2
   \end{cases} \tag{A.12} \]
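
The inversion of STEP 5 might look like the Python sketch
below. Because equation A.11 cannot be read in the source,
the scaling of the transformed estimate by the square root of
(inventory + 0.5) is an assumption made here, chosen because
it undoes the leading factor of the Freeman-Tukey transform.
The function name invert_transform is hypothetical.

```python
import math


def invert_transform(x_hat, n_bar):
    """Map a transformed-scale estimate back to an attrition rate.

    x_hat is the estimate on the transformed scale and n_bar the
    inventory used for scaling (assumed form of equation A.11).
    """
    v = x_hat / math.sqrt(n_bar + 0.5)
    if v <= -math.pi / 2.0:
        return 0.0
    if v >= math.pi / 2.0:
        return 1.0
    return 0.5 * (1.0 + math.sin(v))


if __name__ == "__main__":
    for x_hat in (-6.0, -2.0, 0.0, 2.0, 6.0):
        print(x_hat, round(invert_transform(x_hat, 10.0), 4))
```

The two outer branches keep the estimated rate inside the
interval from zero to one when shrinkage pushes a cell beyond
the range of the arcsine.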


2. LIMITED TRANSLATION JAMES-STEIN ALGORITHM

This algorithm calculates the limited translation
James-Stein estimator of attrition rates. Because the only
difference between this algorithm and the James-Stein
algorithm of Appendix A, Section 1 is in STEP 4, only that
changed step is presented. All other steps remain the same.


STEP 4: Compute the set of limited translation
James-Stein estimators in the transformed scale. Choose d
from the range [0,∞]. Values between .2 and 1.0 seem to
give the best results.

\[ C_{1} = 1 - \frac{(K-3)\,SSE}{(K(T-1)+2)\,SSB} \tag{A.13} \]

\[ A = \frac{(K(T-1)+2)\,SSB}{(K-3)\,SSE} - 1 \tag{A.14} \]

\[ u = (X_{ij} - \bar{X})^{2} / (A+1) \tag{A.15} \]

\[ \rho(u) = \text{minimum}\,(1,\; d/u^{1/2}) \tag{A.16} \]

\[ C_{2} = 1 - \rho(u) \left[ \frac{(K-3)\,SSE}{(K(T-1)+2)\,SSB} \right] \tag{A.17} \]

\[ \delta_{ij} = \begin{cases}
   \bar{X} + C_{2}\,(X_{ij} - \bar{X}) & \text{if } D_{ij} = 1 \\
   \text{undefined} & \text{if } D_{ij} = 0
   \end{cases} \tag{A.18} \]
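
The modified STEP 4 can be illustrated with a short Python
sketch. It assumes, as in the James-Stein case, that the
estimator moves each cell mean toward the grand mean, and it
uses the identity that the bracketed ratio in equation A.17
equals 1/(A+1). The function names are hypothetical.

```python
import math


def limited_translation_factor(x_ij, grand_mean, A, d):
    """Relevance factor rho(u) for one cell mean."""
    u = (x_ij - grand_mean) ** 2 / (A + 1.0)
    if u == 0.0:
        return 1.0  # a cell at the grand mean receives full shrinkage
    return min(1.0, d / math.sqrt(u))


def limited_translation_estimate(x_ij, grand_mean, A, d):
    """Limited translation estimate on the transformed scale."""
    rho = limited_translation_factor(x_ij, grand_mean, A, d)
    c2 = 1.0 - rho / (A + 1.0)
    return grand_mean + c2 * (x_ij - grand_mean)


if __name__ == "__main__":
    # With A = 3 and d = 1 no cell mean moves by more than 0.5.
    for x in (0.0, 1.0, 2.0, 5.0, 10.0):
        print(x, limited_translation_estimate(x, 0.0, 3.0, 1.0))
```

Cell means close to the grand mean receive the full
James-Stein shrinkage, while distant cell means are moved by
a fixed amount, which is the behavior the limited translation
rule is designed to produce.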


APPENDIX B
UTILIZATION OF THE FREEMAN-TUKEY ARCSINE TRANSFORMATION


1. GENERAL

Limited translation James-Stein estimation as discussed
in the Efron and Morris articles [Refs. 5,6] makes the
following assumptions:


(1) The number of losses is normally distributed.


(2) The variances of the normally distributed number of
losses are equal.


Since the raw losses are not normally distributed, the
Freeman-Tukey arcsine transformation

\[ x = 0.5\,[n+0.5]^{1/2} \left( \sin^{-1}\!\left[ \frac{2y}{n+1} - 1 \right]
   + \sin^{-1}\!\left[ \frac{2(y+1)}{n+1} - 1 \right] \right) \tag{B.1} \]

was used, given the central inventory n, to transform the
raw losses y into transformed losses x.


2. NORMALITY OF TRANSFORMED LOSSES 

Tucker [Ref. 1] demonstrated that the distribution of y
given n is Binomial (p), where the parameter p is the
probability of an individual loss. The probability mass
function of this discrete distribution is

\[ P\{Y=y\} = \frac{n!}{y!\,(n-y)!}\; p^{y}\,(1-p)^{n-y}. \tag{B.2} \]

Since there is a unique mapping from Y to X, it is true that

\[ P\{X=x\} = P\{Y=y\} \tag{B.3} \]

if

\[ x = f(y,n). \tag{B.4} \]


To check normality, a central inventory n was chosen.
Possible losses y, from 0 to n, were identified. Then the
above probability mass function and transformed losses x
were calculated. This data was then displayed on an x versus
P{X=x} graph. Figures B.1, B.2, and B.3 are graphs for n
equals 5, 8, and 10. Probabilities of loss are for p equals
.05, .1, .15, and .2. The ranges of greatest concern were
those of low central inventory n and low probability of loss
p.

By inspection, the higher the values of n and p, the
better is the normal approximation. The Freeman-Tukey
transformation is unreliable at low values of n and p.
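
The points plotted in Figures B.1 thru B.3 can be regenerated
with a short Python sketch such as the one below. It is an
illustration only; the function names freeman_tukey and
transformed_pmf are assumptions introduced here and do not
appear in the original programs.

```python
import math


def freeman_tukey(y, n):
    """Transformed loss x for y losses and central inventory n (B.1)."""
    def clipped_asin(value):
        return math.asin(max(-1.0, min(1.0, value)))

    scale = 0.5 * math.sqrt(n + 0.5)
    return scale * (
        clipped_asin(2.0 * y / (n + 1.0) - 1.0)
        + clipped_asin(2.0 * (y + 1.0) / (n + 1.0) - 1.0)
    )


def transformed_pmf(n, p):
    """List of (x, P{X=x}) pairs for y = 0, ..., n under Binomial(n, p)."""
    points = []
    for y in range(n + 1):
        prob = math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)  # (B.2)
        points.append((freeman_tukey(y, n), prob))
    return points


if __name__ == "__main__":
    for x, prob in transformed_pmf(5, 0.05):
        print(round(x, 4), round(prob, 4))
```

For n = 5 and p = .05 most of the probability sits on the
single point y = 0, which is why no transformation of y can
make the distribution look normal in such cells.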


3. VARIANCE STABILITY OF TRANSFORMED LOSSES 
The mean and variance of the Binomial distribution are
well known. Again using the fact that if x is a function of
y and n, then

\[ P\{X=x\} = P\{Y=y\}, \tag{B.5} \]

the variance of x can be calculated, given values of n,
y and x. The values of the variance of x were calculated,
and are graphed in Figure B.4.

By inspection, once n equals 7, the variance stability
at values of p less than .2 is poor. Again, the
Freeman-Tukey transformation is unreliable at low values of
n and p.
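
The variance calculation behind Figure B.4 can be sketched in
Python as follows. This is an illustration under the
assumption that the variance is taken directly over the
binomial probability mass function; the function names are
hypothetical.

```python
import math


def freeman_tukey(y, n):
    """Transformed loss x for y losses and central inventory n (B.1)."""
    def clipped_asin(value):
        return math.asin(max(-1.0, min(1.0, value)))

    scale = 0.5 * math.sqrt(n + 0.5)
    return scale * (
        clipped_asin(2.0 * y / (n + 1.0) - 1.0)
        + clipped_asin(2.0 * (y + 1.0) / (n + 1.0) - 1.0)
    )


def transformed_variance(n, p):
    """Variance of the transformed loss X when Y is Binomial(n, p)."""
    probs = [
        math.comb(n, y) * p ** y * (1.0 - p) ** (n - y) for y in range(n + 1)
    ]
    xs = [freeman_tukey(y, n) for y in range(n + 1)]
    mean = sum(q * x for q, x in zip(probs, xs))
    return sum(q * (x - mean) ** 2 for q, x in zip(probs, xs))


if __name__ == "__main__":
    for n in (5, 8, 10, 50):
        print(n, [round(transformed_variance(n, p), 3)
                  for p in (0.05, 0.1, 0.15, 0.2)])
```

A stabilized variance would be close to one for every p. The
values for small n and small p fall well below that, in line
with the conclusion drawn from the figure.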

Figure B.2 Distribution of the Central Inventory N = 8

Figure B.3 Distribution of the Central Inventory N = 10

Figure B.4 Variance of the Central Inventory


APPENDIX C
LIMITED TRANSLATION FACTOR DERIVATION 


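This is a derivation of the form of the limited translation
factor

\[ \rho(u) = \text{minimum}\,(1,\; d/u^{1/2}). \tag{C.1} \]

Given: From Efron and Morris [Ref. 5],

\[ \delta_{A,M}(x) = \begin{cases}
   x+M & x < -C \\
   Ax/(A+1) & \text{if } x \in [-C,C] \\
   x-M & x > C
   \end{cases} \tag{C.2} \]

where, if X_{ij} and \bar{X} are those used in Appendix A,

\[ C = M(A+1) \tag{C.3} \]

\[ d = M(A+1)^{1/2} \tag{C.4} \]

\[ u = (X_{ij} - \bar{X})^{2}/(A+1) \tag{C.5} \]

\[ \delta_{A,M}(x) = [1 - \rho(u)/(A+1)]\,x. \tag{C.6} \]

Derivation: From equation C.2 we have,

\[ \delta_{A,M}(x) = \begin{cases}
   x+M & x < -C \\
   Ax/(A+1) & \text{if } x \in [-C,C] \\
   x-M & x > C
   \end{cases} \tag{C.7} \]

By substitution, and multiplication by x/x,

\[ (1 - \rho(u)/(A+1))\,x = \begin{cases}
   (x+M)\,x/x & [u(A+1)]^{1/2} < -M(A+1) \\
   Ax/(A+1) & \text{if } [u(A+1)]^{1/2} \in [-M(A+1),\,M(A+1)] \\
   (x-M)\,x/x & [u(A+1)]^{1/2} > M(A+1)
   \end{cases} \tag{C.8} \]

By multiplication by 1/x, and cancellation,

\[ 1 - \rho(u)/(A+1) = \begin{cases}
   (x+M)/x & u^{1/2} < -M(A+1)^{1/2} \\
   A/(A+1) & \text{if } u^{1/2} \in [-M(A+1)^{1/2},\,M(A+1)^{1/2}] \\
   (x-M)/x & u^{1/2} > M(A+1)^{1/2}
   \end{cases} \tag{C.9} \]

By subtracting 1, and multiplying by -1,

\[ \rho(u)/(A+1) = \begin{cases}
   1 - (x+M)/x & u^{1/2} < -M(A+1)^{1/2} \\
   1 - A/(A+1) & \text{if } u^{1/2} \in [-M(A+1)^{1/2},\,M(A+1)^{1/2}] \\
   1 - (x-M)/x & u^{1/2} > M(A+1)^{1/2}
   \end{cases} \tag{C.10} \]

By multiplying by (A+1), and substitution,

\[ \rho(u) = \begin{cases}
   (A+1)[1 - (x+M)/x] & u^{1/2} < -d \\
   (A+1)[1 - A/(A+1)] & \text{if } u^{1/2} \in [-d,d] \\
   (A+1)[1 - (x-M)/x] & u^{1/2} > d
   \end{cases} \tag{C.11} \]

\[ \rho(u) = \begin{cases}
   -M((A+1)/u)^{1/2} & u^{1/2} < -d \\
   1 & \text{if } u^{1/2} \in [-d,d] \\
   M((A+1)/u)^{1/2} & u^{1/2} > d
   \end{cases} \tag{C.12} \]

\[ \rho(u) = \begin{cases}
   -d/u^{1/2} & u^{1/2} < -d \\
   1 & \text{if } u^{1/2} \in [-d,d] \\
   d/u^{1/2} & u^{1/2} > d
   \end{cases} \tag{C.13} \]

In equations C.8 thru C.13, u^{1/2} carries the sign of x.
If u^{1/2} < -d, then 0 < -d/u^{1/2} < 1, and if u^{1/2} > d,
then 0 < d/u^{1/2} < 1. Therefore, taking u^{1/2} as a
magnitude, the last equation may be written as

\[ \rho(u) = \text{minimum}\,(1,\; d/u^{1/2}). \tag{C.14} \]

It is worth noting that if x ∈ [-C,C] and ρ(u)=1, then

\[ \delta_{A,M}(x) = Ax/(A+1). \tag{C.15} \]

Also if x ∉ [-C,C] and 0 < ρ(u) < 1, then

\[ Ax/(A+1) < \delta_{A,M}(x) < x. \tag{C.16} \]

Thus, if x is outside the interval [-C,C], then this x is
translated less than any x's inside the interval [-C,C].
This confirms the graphical representation of limited
translation shown in Figure 3.1.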
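
The result of this appendix can be checked numerically. The
Python sketch below compares the piecewise rule of equation
C.2 with the factor form of equation C.6, taking the square
root of u as a magnitude. The function names and the test
values of A and M are arbitrary choices made for
illustration.

```python
import math


def delta_piecewise(x, A, M):
    """Limited translation rule in the piecewise form of equation C.2."""
    C = M * (A + 1.0)
    if x < -C:
        return x + M
    if x > C:
        return x - M
    return A * x / (A + 1.0)


def delta_via_rho(x, A, M):
    """The same rule written with the factor rho(u) of equation C.6."""
    d = M * math.sqrt(A + 1.0)
    u = x * x / (A + 1.0)
    rho = 1.0 if u == 0.0 else min(1.0, d / math.sqrt(u))
    return (1.0 - rho / (A + 1.0)) * x


if __name__ == "__main__":
    for x in (-5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0):
        print(x, delta_piecewise(x, 3.0, 0.5), delta_via_rho(x, 3.0, 0.5))
```

Both columns of the printed output agree, including at the
boundary points x = -C and x = C.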

APPENDIX D
DATA MANIPULATION


1. GENERAL

This appendix documents the steps and programs used to
take the raw data from magnetic tape to the output of
figures of merit and attrition rates. The program languages
include JCL, WATFIV, and APL. The author is familiar with
the system at the Naval Postgraduate School, and the
instructions are therefore specific for that system.
However, minor changes should be all that are necessary to
implement these procedures elsewhere. To achieve good
results, the procedures should be followed in the order
presented.


2. CONVERSION OF RAW DATA FROM TAPE TO APL 

The original data is on a magnetic tape named COUNTS, 
prepared by NPRDC. The tape is held by either Professor 
R.R. Read, or by the Computer Center in Professor Read's 
name. Ensure the tape is properly logged into the Computer 
Center, and submit the JCL program IEBGENER in Figure D.l to 
put the tape on mass storage. The data set in mass storage 
should be named MSS.SXXXX.COUNTS, where XXXX is the user ID 
number of the operator. 

Once in mass storage, submit the JCL program MSSCOUN in 
Figure D.2 to move the data from mass storage to the MVS004 
disk. Data on this disk is accessible from CMS. 

Use the system exec GETMVS to move the data from the 
MVS004 disk to your disk. Simply enter 'GETMVS' on the 
computer and follow the directions. The identification 
requested by the prompts of GETMVS will be 'SXXXX COUNTS'. 
Since the data set is large (16,093 lines of 53 columns
each), it is advisable to get additional workspace by either
applying for a B disk of at least 8 cylinders, or by getting
a temporary C disk of at least 8 cylinders. Be aware that a
C disk disappears once logged off, and all data on it is
erased.


    [JOB statement illegible in the source scan]
    // EXEC PGM=IEBGENER
    //SYSPRINT DD SYSOUT=A
    //SYSIN    DD DUMMY
    //SYSUT1   DD UNIT=3400-5,VOL=SER=COUNTS,DISP=(OLD,PASS),
    //            DCB=(RECFM=FB,LRECL=53,BLKSIZE=21200,DEN=4)
    //SYSUT2   DD UNIT=[illegible],MSVGP=PUB4B,DISP=(NEW,CATLG),
    //            DCB=(RECFM=FB,LRECL=53,BLKSIZE=12985),
    //            DSN=MSS.S1662.COUNTS

Figure D.1 JCL Program IEBGENER


    //COUNTS   JOB (1662,9999),'MAJ J.R.ROBINSON',CLASS=B
    // EXEC PGM=IEBGENER
    //SYSPRINT DD SYSOUT=A
    //SYSIN    DD DUMMY
    //SYSUT1   DD DSN=MSS.S1662.COUNTS,DISP=SHR
    //SYSUT2   DD DSN=S1662.COUNTS,VOL=SER=MVS004,UNIT=[illegible],
    //            SPACE=(CYL,(2,2),RLSE),DISP=(NEW,KEEP)

Figure D.2 JCL Program MSSCOUN


The WATFIV program SORT in Figure D.3 should now be used 
to separate the data in COUNTS into seven files, one file 
for each fiscal year. The data files can be conveniently 
named COUNXX DATA, where XX is the year, e.g., 77. 

Using the WATFIV program INV in Figure D.4, create an
array of inventory indices for each fiscal year. The program
should read in the data sets prepared in step 4 above, e.g.,
COUN77 DATA. The output is written to a file that can be
named INVXX DATA, where XX is the fiscal year, e.g., 77.
Note that, for each fiscal year, a different DO loop is
used, since the number of data rows varies from year to
year. Ensure the read and write files are properly defined
for each year.


    C     THIS PROGRAM ASSIGNS THE DATA FROM COUNTS
    C     INTO SEPARATE FILES FOR EACH FISCAL YEAR.
          INTEGER YEAR,OF,GRADE,LOS,INV,L1,L2,L3,L4,
         *L5,L6,L7,L8
          DO 100 I=1,16093
          READ (15,20,END=200) YEAR,OF,GRADE,LOS,INV,
         *L1,L2,L3,L4,L5,L6,L7,L8
    C     [FORMAT STATEMENT 20 AND THE WRITE UNIT NUMBERS ARE
    C     ILLEGIBLE IN THE SOURCE SCAN. ONE STATEMENT OF THE
    C     FOLLOWING FORM APPEARS FOR EACH YEAR 77 THRU 83.]
          IF (YEAR.EQ.77) WRITE ([illegible]) YEAR,OF,GRADE,LOS,
         *INV,L1,L2,L3,L4,L5,L6,L7,L8
      100 CONTINUE
      200 STOP
          END

Figure D.3 WATFIV Program SORT


Use the WATFIV program LOSS in Figure D.5 to create an
array of loss indices for each fiscal year, similar to the
inventory indices in step 5 above. Note that the losses are
aggregated, in that the loss data on the COUNXX DATA file is
broken into 8 different loss categories (see Reference 1).
Such a breakdown is not used in the present work. See Tucker
[Ref. 1] for programs and procedures if loss type data is
desired. Again, the data can be read into files named LOSSXX
DATA, where XX is the fiscal year. The arrays LOSSXX will be
significantly smaller than the INVXX arrays.

Finally, use the APL system exec CMSIO to move the data
files INVXX and LOSSXX into an APL workspace. The APL arrays
should be character, vice numeric, arrays, and CMSIO allows
this choice. The APL functions INVMATX and MATRIX discussed
below assume the APL character arrays are words of ten
characters, the first nine characters being the data index,
and the tenth character a blank.


          INTEGER GRADE,OF,YEAR,LOS,INDEX,IN,TOT,IYR
    C     THE DO LOOPS CORRESPOND TO THE NUMBER OF
    C     RECORDS FOR YEARS 1977 THRU 1983.
    C     [THE DO STATEMENTS, THE READ AND FORMAT STATEMENTS,
    C     AND THE ASSIGNMENTS PRECEDING THE CALL ARE ILLEGIBLE
    C     IN THE SOURCE SCAN.]
          CALL SUM (IN,TOT,OF,GRADE,LOS,[illegible])
      100 CONTINUE
          END
    C     THIS SUBROUTINE CREATES THE INDEX ARRAY
    C     FOR AN INVENTORY DATA ELEMENT.
          SUBROUTINE SUM (INDEX,I,J,K,L,NUM)
          INTEGER INDEX,I,J,K,L,NUM
          INDEX = I+J*1000000+K*100000+L*1000+NUM
          WRITE (11,300) INDEX
      300 FORMAT (I9)
          RETURN
          END
    $ENTRY

Figure D.4 WATFIV Program INV



3. CREATING THE INVENTORY AND LOSS ARRAYS 

Using the INVXX arrays created above, and the APL
functions GETINV in Figure D.6 and INVMATX in Figure D.7,
create the arrays IXX. Note that GETINV calls INVMATX, and
INVMATX uses the INVXX arrays. APL workspace size
limitations may be a problem due to the large amount of
data, and it may be necessary to create one or two arrays at
a time, and then copy them to another workspace.

The LXX arrays are created in a manner similar to the
above, using the APL functions GETLOSS in Figure D.8, and
MATRIX in Figure D.9. MATRIX uses the loss arrays LOSSXX.


          INTEGER GRADE,OF,YEAR,LOS,IYR,LOSS
    C     THE DO LOOPS CORRESPOND TO THE NUMBER OF
    C     RECORDS FOR YEARS 1977 THRU 1983.
    C     [THE DO STATEMENTS, THE READ AND FORMAT STATEMENTS,
    C     AND THE ASSIGNMENTS TO IYR AND LOSS ARE ILLEGIBLE IN
    C     THE SOURCE SCAN. EIGHT IF STATEMENTS FOLLOW, ONE FOR
    C     EACH LOSS CATEGORY L1 THRU L8, EACH OF WHICH TESTS
    C     THE CATEGORY WITH .GT. AND CALLS SUM.]
          IF ([illegible]) CALL SUM (IN,LOSS,OF,GRADE,LOS,
         *L1)
      100 CONTINUE
      200 STOP
          END
    C     THIS SUBROUTINE CREATES THE INDEX ARRAY
    C     FOR A LOSS DATA ELEMENT.
          SUBROUTINE SUM (INDEX,I,J,K,L,NUM)
          INTEGER INDEX,I,J,K,L,NUM
          INDEX = I+J*1000000+K*100000+L*1000+NUM
          WRITE (11,300) INDEX
      300 FORMAT (I9)
          RETURN
          END

Figure D.5 WATFIV Program LOSS


        ∇ GETINV
    [1]   ⍝ THIS FUNCTION CALLS THE FUNCTION INVMATX
    [2]   ⍝ FOR EACH FISCAL YEAR. IXX IS THE INVENTORY
    [3]   ⍝ ARRAY FOR FISCAL YEAR XX BY OF/LOS/GRADE.
    [4]   I77←INVMATX INV77
    [5]   I78←INVMATX INV78
    [6]   I79←INVMATX INV79
    [7]   I80←INVMATX INV80
    [8]   I81←INVMATX INV81
    [9]   I82←INVMATX INV82
    [10]  I83←INVMATX INV83
    [11]  'SHAPE OF I77 IS '
    [12]  ⍴I77
        ∇

Figure D.6 APL Function GETINV


        ∇ Z←INVMATX X;A;B;C;D;E;I;J
    [1]   ⍝ CREATES THE INVENTORY ARRAYS FOR THE FISCAL
    [2]   ⍝ YEARS USING THE ARRAYS OF INDEXES INVXX.
    [3]   ⍝ INVXX MUST BE A CHARACTER VECTOR OF 9 DATA
    [4]   ⍝ ENTRIES FOLLOWED BY 1 BLANK FOR EACH LOOP.
    [5]   Z←(40 31 10)⍴0
    [6]   I←⍴X
    [7]   J←(I+1)÷10
    [8]  LOOP:→(J=0)/OUT
          [Lines 9 thru 15, which decode one ten-character word
          of X and store its inventory count in Z, are illegible
          in the source scan.]
    [16]  J←J-1
    [17]  →LOOP
    [18] OUT:'FINISHED -- SHAPE OF MATRIX IS '
    [19]  ⍴Z
        ∇

Figure D.7 APL Function INVMATX


        ∇ GETLOSS
          ⍝ THIS FUNCTION CALLS MATRIX FOR EACH FISCAL
          ⍝ YEAR. LXX IS THE LOSS ARRAY FOR FISCAL YEAR
          ⍝ XX BY OF/LOS/GRADE.
          L77←MATRIX LOSS77
          L78←MATRIX LOSS78
          L79←MATRIX LOSS79
          L80←MATRIX LOSS80
          L81←MATRIX LOSS81
          L82←MATRIX LOSS82
          L83←MATRIX LOSS83
        ∇

Figure D.8 APL Function GETLOSS


        ∇ Z←MATRIX X;A;B;C;D;E;F;I;J
          ⍝ THIS FUNCTION CREATES THE LOSS ARRAY FOR
          ⍝ THE FISCAL YEARS USING THE ARRAY OF LOSS
          ⍝ INDEXES LOSSXX. IT IS CALLED BY GETLOSS.
          ⍝ LOSSXX MUST BE A CHARACTER VECTOR WITH 9
          ⍝ DATA ENTRIES FOLLOWED BY 1 BLANK FOR EACH
          ⍝ LOOP.
          Z←(40 31 10)⍴0
          I←⍴X
          [The remaining lines of the function are illegible in
          the source scan.]
        ∇

Figure D.9 APL Function MATRIX


To create the aggregate (e.g., aviation, combat support,
ground combat) inventory arrays, use the APL functions GETAV
in Figure D.10, GETCS in Figure D.11, GETGC in Figure D.12,
and GETOF in Figure D.13. GETOF is called by the other three
functions. Each calling function creates one aggregate
array, e.g., GETGC creates the ground combat inventory array
named GC. Again, workspace size limitations may be a
problem, and the functions can be altered to create part of
an array at a time. This permits the operator to reduce the
number of IXX arrays present in the workspace. Note that the
arrays are 'central inventory' arrays, taking the maximum of
the yearly loss and the average of the year's inventory.
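
The central inventory rule described above amounts to one
line of arithmetic. The Python sketch below assumes that the
average of the year's inventory is the mean of the inventory
at the beginning of the year and at the beginning of the next
year; the function name central_inventory is hypothetical.

```python
def central_inventory(loss, inv_begin, inv_next):
    """Larger of the yearly loss and the average inventory for the year."""
    return max(loss, 0.5 * (inv_begin + inv_next))


if __name__ == "__main__":
    print(central_inventory(1, 10, 8))  # average inventory dominates
    print(central_inventory(3, 2, 0))   # loss exceeds the average inventory
```

Taking the maximum guarantees that the number of losses never
exceeds the central inventory, which the binomial model and
the arcsine transform both require.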


        ∇ GETAV
          ⍝ THIS CREATES THE INVENTORY MATRIX FOR
          ⍝ THE AVIATION OCCUPATION GROUP. CALLS GETOF.
          ⍝ USE LINES 6-7 FOR ESTIMATION INVENTORY
          ⍝ MATRIX.
          AV←(4 31 10)⍴0
          AV←GETOF 38
          ⍝ USE LINES 9-11 FOR VALIDATION INVENTORY
          ⍝ MATRIX.
          VAV←([illegible] 31 10)⍴0
          VAV←GETOF 38
        ∇

Figure D.10 APL Function GETAV


        ∇ GETCS
          ⍝ THIS CREATES THE INVENTORY MATRIX FOR THE
          ⍝ COMBAT SUPPORT GROUP. CALLS GETOF.
          ⍝ USE LINES 5-12 FOR ESTIMATION INVENTORY
          ⍝ MATRIX.
          CS←(3 4 31 10)⍴0
          ⍝ THIS CREATES THE OF ENGINEERS
          CS[1;;;]←GETOF 7
          ⍝ THIS CREATES THE OF OPERATIONAL COMMUNICATIONS
          CS[2;;;]←GETOF 13
          ⍝ THIS CREATES THE OF MOTOR TRANSPORT
          CS[3;;;]←GETOF 20
          ⍝ USE LINES 15-22 FOR VALIDATION INVENTORY
          ⍝ MATRIX.
          VCS←(3 [illegible] 31 10)⍴0
          ⍝ THIS CREATES THE OF ENGINEERS
          VCS[1;;;]←GETOF 7
          ⍝ THIS CREATES THE OF OPERATIONAL COMMUNICATIONS
          VCS[2;;;]←GETOF 13
          ⍝ THIS CREATES THE OF MOTOR TRANSPORT
          VCS[3;;;]←GETOF 20
        ∇

Figure D.11 APL Function GETCS


        ∇ GETGC
          ⍝ THIS CREATES THE INVENTORY MATRIX FOR THE
          ⍝ GROUND COMBAT OCCUPATION GROUP. CALLS
          ⍝ GETOF.
          ⍝ USE LINES 6-12 FOR ESTIMATION INVENTORY
          ⍝ MATRIX.
          GC←(3 4 31 10)⍴0
          ⍝ THIS CREATES THE OF INFANTRY
          GC[1;;;]←GETOF 3
          ⍝ THIS CREATES THE OF ARTILLERY
          GC[2;;;]←GETOF 5
          ⍝ THIS CREATES THE OF TANKS AND AMPHIB
          GC[3;;;]←GETOF 10
          ⍝ USE LINES 15-21 FOR VALIDATION INVENTORY
          ⍝ MATRICES.
          VGC←(3 3 31 10)⍴0
          ⍝ THIS CREATES THE OF INFANTRY
          VGC[1;;;]←GETOF 3
          ⍝ THIS CREATES THE OF ARTILLERY
          VGC[2;;;]←GETOF 5
          ⍝ THIS CREATES THE OF TANKS AND AMPHIB
          VGC[3;;;]←GETOF 10
        ∇

Figure D.12 APL Function GETGC

Figure D.13 APL Function GETOF


The aggregate loss arrays (e.g., AVL, CSL, GCL) are now
created using the APL functions AVLOSS in Figure D.14,
CSLOSS in Figure D.15, and GCLOSS in Figure D.16. These
functions call the arrays LXX directly. Again, workspace
size may require creative changes to the functions.

Figure D.14 APL Function AVLOSS

Figure D.15 APL Function CSLOSS

Figure D.16 APL Function GCLOSS


Use the APL functions GETNA in Figure D.1/, GEINC in 
Figure D.18, and GETNG in Figure D.19 to create the desired 
grade specific inventory arrays, e.g., NA5, which is the 
central inventory array for aviation First Lieutenants 
(grade code 5). See Table 17 for grade codes. These func- 
tions call the aggregate inventory arrays, i.e., AV, CS, and 
GC. 

The APL functions GETYA in Figure D.20, GETYC in Figure 
D.21, and GETYG in Figure D.22 can now be used to create the 
desired grade specific loss arrays, e.g., YA5, which is the 
loss array for aviation First Lieutenants. These functions 
use the aggregate loss arrays, i.e., AVL, CSL, and GCL.

Use the APL functions GETANA in Figure D.23, GETANC in 
Figure D.24, and GETANG in Figure D.25 to create the arrays
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of grade specific average inventory. For example, ANA5 is 
the average inventory array for aviation First Lieutenants. 
The above three functions call GETAAV in Figure D.26, GETACS 
in Figure D.27, and GETAGC in Figure D.28, respectively. 
Also, this second group of three functions all call GETAOF 
in Figure D.29, which in turn uses the arrays IXX. 

The procedures have now produced central inventory and 
loss arrays for the estimation and validation years, and the 
average inventory arrays for the estimation years. All of 
these arrays are needed to calculate attrition rates and 
figures of merit. Note there is no requirement to create
arrays of the average losses.
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Figure D.18 APL Function GETNC 


4. ATTRITION RATE AND FOM CALCULATIONS 


The remainder of the procedures assumes certain global
APL variables are defined in the workspace:

(1) N = the estimation period central inventory array,
e.g., NA5,
(2) Y = the estimation period loss array, e.g., YA5,
(3) VN = the validation period central inventory
array, e.g., VNA5,
(4) VY = the validation period loss array, e.g., VYA5,
(5) AN = the estimation period average inventory array,
e.g., ANA5, and
(6) G = the James-Stein forced shrinkage rate.


All functions are invoked by entering the function name.
Since the above global variables and the global output
variables from functions discussed below are used to calculate
attrition rates and FOM's, care must be taken to ensure
variable values are not changed inadvertently.
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Figure. Dp, APL Function GETNG 


Use the APL function ESTIM in Figure D.30 to calculate 
the array R of transformed scale attrition rate estimates. 
This function calls BINPREP in Figure D.31, SUMSQ in Figure 
D.32, and MLE in Figure D.33. The first page of R is the 
array of original scale aggregate rates, each entry being 
equal. The second page is the array of transformed scale 
aggregate rates. The third page is the cell MLE rates. The 
fourth page is the TSCA estimates. The fifth page is the 
James-Stein estimates, and the sixth and subsequent pages 
(if any) are the limited translation James-Stein rate 
estimates for different values of d. 

The user now has several options. First, the relative
savings loss for different values of d can be calculated
using the APL function RELS in Figure D.34. Enter ESTIM to
select the d values, ensuring the shape of R conforms by
adding enough limited translation pages to handle all
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Figure D.20 APL Function GETYA 


selected values of d. Next, enter RELS and ensure the shape 
of the variable RSL (which is the output relative savings
loss) conforms to the number of d values. Then run RELS. A 
graph of d versus RSL allows a choice of the best d. 

The second option, if a value of d is available, is to 
calculate the risks of the various estimation methods (in 
the transformed scale) represented in the R array. Use ESTIM
and the APL function RISKT in Figure D.35 for these calcula- 
tions. ESTIM calculates the R array using a single d value, 
and RISKT uses this R to calculate risks. The function can 
only handle an R array that has one page of limited 
translation rates. Output is by validation year. 

Calculation of the risk in the original scale is the 
third option. Use the function RISKO in Figure D.36 just as 
RISKT was used. This function calls BINCONV (Figure D.37). 
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Both RISKT and RISKO require an array SS to be defined 
globally. SS allows selection of the specific cells to be
included in the risk output by the two functions. If the
global risk is desired, SS should be an array of all ones.
If certain cells are to be investigated, e.g., cells with
inventory less than six, set up SS so that each cell with an
inventory less than six has an entry of 1 in SS, and all
other cells have an entry of zero. SS must have the same
shape as the inventory and loss arrays, e.g., NA5 and YA5.
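
As an illustration only, the two choices of SS described above can be sketched in Python rather than APL. The helper name selection_mask, the 2-by-3 shape, and the inventory counts are all hypothetical stand-ins for a real array such as NA5; the workspace functions themselves are not reproduced here.

```python
def selection_mask(inventory, threshold=None):
    """Return a 0/1 array with the same shape as `inventory`.

    threshold=None selects every cell (the global risk case); otherwise
    only cells whose inventory is strictly below `threshold` get a 1.
    """
    return [[1 if threshold is None or n < threshold else 0 for n in row]
            for row in inventory]

# Made-up inventory counts standing in for a small slice of an array like NA5.
N = [[12, 3, 0],
     [5, 6, 40]]

SS_global = selection_mask(N)      # every cell contributes to the risk
SS_small = selection_mask(N, 6)    # only cells with inventory less than six

print(SS_global)
print(SS_small)
```

Because the mask has the same shape as the inventory array, it can be multiplied cell by cell into whatever per-cell loss terms the risk calculation sums.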

The fourth option is to calculate the original scale 
attrition rate estimates using the APL function BINCONV in 
Figure D.38. Use the array R and the central inventory array
for the aggregate being investigated.
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Figure D.22 APL Function GETYG 
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APL Function GETANA 


Figure D.23 
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Figure D.24 
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Figure D.25 APL Function GETANG 
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Figure D.26 APL Function GETAAV 
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Figure D.27 APL Function GETACS 
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Figure D.32 APL Function SUMSQ 
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APPENDIX E 
RELATIVE SAVINGS LOSS 


1. THEORETICAL IMPLICATIONS OF RELATIVE SAVINGS LOSS

The value of d can be empirically chosen using a concept
developed by Efron and Morris [Ref. 5] called the relative
savings loss (RSL). The RSL quantifies how well limited
translation does versus James-Stein estimation. If

(1) $R_T$ = risk of the TSCA,
(2) $R_J$ = risk of the James-Stein estimator, and
(3) $R_L$ = risk of the limited translation James-Stein
estimator,
then,

$$\mathrm{RSL} = (R_L - R_J)/(R_T - R_J). \qquad \text{(E.1)}$$

RSL is the proportional increase in global loss if limited
translation estimation is used instead of James-Stein
estimation.

Efron and Morris [Ref. 5] further state and prove a
theorem:

$$\mathrm{RSL} = 2[(d^2+1)(1-\Phi(d)) - d\,\phi(d)] \qquad \text{(E.2)}$$

where $\Phi$ and $\phi$ are the standard normal c.d.f. and density
function, respectively. By this theorem, RSL is a function
of d only. Figure E.1 graphs values of the RSL against d.
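
A short Python sketch can tabulate the theoretical curve of equation E.2 for a few values of d. It assumes only that the two functions in E.2 are the standard normal c.d.f. and density; the function names and the grid of d values are illustrative choices, not part of the original workspace.

```python
import math

def normal_cdf(d):
    """Standard normal c.d.f., computed from the error function."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def normal_pdf(d):
    """Standard normal density."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def theoretical_rsl(d):
    """Equation E.2: RSL = 2[(d^2 + 1)(1 - Phi(d)) - d * phi(d)]."""
    return 2.0 * ((d * d + 1.0) * (1.0 - normal_cdf(d)) - d * normal_pdf(d))

# Illustrative grid of d values.
for d in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"d = {d:3.1f}   RSL = {theoretical_rsl(d):.4f}")
```

At d = 0 the expression equals 1, and it decreases toward zero as d grows, which is the trade-off a plot of RSL against d is meant to display.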


2. APPLICATION OF RELATIVE SAVINGS LOSS 

As $R_L$ is a function of d, a vector of values of $R_L$
can be calculated for d>0. This can yield a vector of
values of the RSL. These values can be graphed against the
values of d which generated the RSL vector. Efron and
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Morris [Refs. 5,6] state that, given the normality with 
common variance assumption, the relationship of the global
risks of maximum likelihood, James-Stein, and limited
translation will be

$$R_J < R_L < R_T. \qquad \text{(E.3)}$$

Tables 4 thru 12 clearly show this is not the case when we
identify the validation FOM's as the components of equation
E.3.

Figures E.2 thru E.7 are the graphs of calculated RSL
versus d for the study cases. The encouraging results are
misleading. Tables 4, 5, and 6 display risk values that
yield negative quantities in the numerator and denominator
of equation E.1. Although the graphs are visually correct,
the underlying computations are at variance with theory, and
are simply generating offsetting errors. Therefore RSL
should not be used to choose d for the present aggregations.
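
The offsetting-error effect can be made concrete with a small Python sketch. It assumes equation E.1 is read as the limited translation risk minus the James-Stein risk, divided by the TSCA risk minus the James-Stein risk. The function name and the risk values are hypothetical and are not taken from Tables 4 thru 12.

```python
def empirical_rsl(risk_tsca, risk_js, risk_lt):
    """Equation E.1 plus a check of the signs that theory expects.

    Returns (rsl, consistent). `consistent` is True only when the
    numerator is non-negative and the denominator is positive.
    """
    numerator = risk_lt - risk_js
    denominator = risk_tsca - risk_js
    if denominator == 0:
        raise ValueError("TSCA and James-Stein risks are equal; RSL is undefined")
    consistent = numerator >= 0 and denominator > 0
    return numerator / denominator, consistent

# Hypothetical risks whose ordering agrees with theory.
print(empirical_rsl(1.00, 0.60, 0.70))
# Hypothetical risks with the ordering reversed: both differences are negative.
print(empirical_rsl(0.60, 1.00, 0.90))
```

Both calls return essentially the same ratio, but only the first has the signs theory requires. A plot of RSL alone cannot distinguish the two cases, so a sign check of this kind is worth running before any graph of RSL versus d is trusted.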
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APPENDIX F 
ROBUST PARAMETRIC EMPIRICAL BAYES ANALYSIS 


Robust parametric empirical Bayes (RPEB) estimation is a
version of a generalized linear model (see McCullagh
[Ref. 7]) combined with robust Bayes. It attempts to fit a
probabilistic model that assumes the cell value $p_i$ for the
ith cell is a random variable that can be described with a
probability distribution, e.g., the logistic or the Poisson.

Gaver and O'Muircheartaigh [Ref. 8] developed this technique
using the Poisson to model collections of similar
objects that independently generate events in accordance
with Poisson processes. First, the superpopulation parameters
are estimated, using point and interval estimates.
Next, the superpopulation parameter estimates are used to
compute point and interval estimates of the individual rate
parameters. The result is a Bayes estimate of each individual
rate, with limited shrinkage if the inventory is small.

Gaver has suggested, in the spirit of reference 8, using
a logistic model for $p_i$ with explanatory variables for OF,
LOS, and grade, and an extra binomial variance term to
partly explain differences across cells. First, using numerical
integration, the maximum likelihood function $L(\mu,\tau)$ is
used to find the values of $\mu$ and $\tau$ that maximize the
likelihood function. Second, the expectation of $p_i$ given the loss
$y_i$ is calculated. This is the posterior mean of $p_i$, and is
the Bayes estimator of $p_i$, with limited shrinkage,
especially if the inventory $n_i$ is small.


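Therefore, let

$$p_i = e^{m_i}/(1+e^{m_i}) = e^{q_i}/(1+e^{q_i}) \qquad \text{(F.1)}$$

where

$m_i = x_i\beta + \delta_i$,
$q_i = \mu + \tau\varepsilon_i(\eta)$,
$\varepsilon_i \sim$ Student t with $\eta$ degrees of freedom,
$y_i$ = losses from the ith cell, and
$n_i$ = inventory in the ith cell.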


Then the maximum likelihood function is

$$L(\mu,\tau) = \prod_{i=1}^{n} \int \frac{e^{(\mu+\tau x)y_i}}{(1+e^{\mu+\tau x})^{n_i}\,[1+(x/\tau)^2/\eta]^{(\eta+1)/2}}\,dx \qquad \text{(F.2)}$$

where i = 1,...,n, and the integration is performed over the
real line. This likelihood function is used to find $\mu$ and $\tau$
that maximize $L(\mu,\tau)$.

Let $y = y_i$. Then the expected value of $p_i$ is

$$E[p_i] = \int p_i(x)\,p_i^{y}(1-p_i)^{n_i-y}\,\frac{c(\eta)}{[1+(x/\tau)^2/\eta]^{(\eta+1)/2}}\,dx, \qquad \text{(F.3)}$$

where $c(\eta)$ is the appropriate normalizing
constant, and the integration is performed over the real
line. This expected value is the empirical Bayes estimator
of $p_i$.
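
The posterior mean calculation can be sketched numerically in Python. This is a minimal illustration under stated assumptions, not Gaver's procedure: it takes $\mu$, $\tau$, and the degrees of freedom as already known, writes the logit as $\mu + \tau x$ with x a standard Student t variable (a scaling that may differ from the one printed in equations F.2 and F.3), and uses a simple midpoint rule on a truncated range. The function names, the truncation half-width, and the example values are all hypothetical.

```python
import math

def logistic(q):
    """Inverse logit: e^q / (1 + e^q)."""
    return 1.0 / (1.0 + math.exp(-q))

def posterior_mean(y, n, mu, tau, dof, half_width=60.0, steps=12000):
    """Posterior mean of p given y losses from an inventory of n.

    Assumes p = logistic(mu + tau * x), with x Student t on `dof` degrees
    of freedom. Normalizing constants cancel in the ratio, so only the
    kernel of the t density is needed. Midpoint rule on [-half_width, half_width].
    """
    h = 2.0 * half_width / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        p = logistic(mu + tau * x)
        kernel = (1.0 + x * x / dof) ** (-(dof + 1.0) / 2.0)
        weight = (p ** y) * ((1.0 - p) ** (n - y)) * kernel
        num += p * weight
        den += weight
    return num / den

# Hypothetical small cell: no losses observed from an inventory of 3.
print(posterior_mean(0, 3, mu=-2.0, tau=1.0, dof=5.0))
# Hypothetical large cell: 200 losses from an inventory of 1000.
print(posterior_mean(200, 1000, mu=-2.0, tau=1.0, dof=5.0))
```

For the small cell the estimate is strictly positive even though the raw rate is zero, while for the large cell it stays close to the raw rate of 0.2. In practice $\mu$ and $\tau$ would first be obtained by maximizing the likelihood, a step this sketch leaves out.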

See Deely and Lindley [Ref. 9] for more on empirical 
Bayes. 
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